Add support for git commit verification#3905
Conversation
zhming0
left a comment
There was a problem hiding this comment.
I see the value in this change, but I left some questions re overall directions and the cost / gain asymmetry. Also I recommend relocating some codes to another file.
| if e.Tag != "" { | ||
| return nil | ||
| } | ||
|
|
||
| // Skip if this is a PR build — the commit may be on a merge ref, not the target branch | ||
| if e.PullRequest != "" { | ||
| return nil | ||
| } | ||
|
|
||
| // Skip if a custom refspec is set — the fetch may not populate standard branch refs, | ||
| // making ancestry verification unreliable | ||
| if e.RefSpec != "" { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
While I totally see the value of this change adding a level of protection, I worry that it's a bit narrow given all these bypass conditions.
Also on the direction, it's ultimately saying: "under X case, do not trust Buildkite backend". It's a questionable direction for us to invest in though we have prior arts (I think). But I do wonder if we could/should strengthen the backend to not send false information.
It's not strictly a blocker, but something worth a number of words to explain about.
There was a problem hiding this comment.
"under X case, do not trust Buildkite backend"
There's plenty of prior art for client-side protections: pre-bootstrap hook for job validation, Signed Pipelines, allowed-env-vars flag, allowed-plugins flag, etc.
In the original Buildkite model, where the backend doesn't have access to the code, the backend can't determine whether or not a particular commit is on a particular branch. I'm not sure how we would strengthen the backend to send only commits on the intended branch without requiring code access.
There was a problem hiding this comment.
I agree with Josh about the client-side protections. There's no way for us to verify that a commit is or isn't on the claimed branch without having a checkout of the repo. Theoretically there are ways that we could do this with the GitHub API before triggering a build, but such a solution has potential access issues, race conditions, and rate limit implications.
In terms of the bypass conditions you mentioned, those are all scenarios in which commit verification isn't meaningful. For example:
HEAD commit - we're not building a specific commit, we're building whatever is at HEAD, we're not being told to build something specific.
Empty branch - if we're not given a branch name, then there are no branch-based conditions to exploit.
PR builds - If it's a PR build, there's no risk of pretending that it's on main.
There was a problem hiding this comment.
👍 I don't see another way out either for this particular defense so I have no objection. My main point is mainly around holistically if this prevention mechanism is meaningful enough when an attacker gained BK API access.
Could the attacker specify refspec (which bypass the protection) + branch at the same time? I didn't trace that code path so a bit unsure.
So not a blocker, mainly a question for my understanding.
There was a problem hiding this comment.
Yeah I got you - so what I would say is that there are two attack vectors here. The first is webhook manipulation - someone creates a branch and commits malicious code to it, then spoofs a webhook that includes that commit SHA, with branch: main. Refspec isn't part of this payload, so that concern isn't valid here. Without verification, this is enough for us to build a malicious commit with main permissions.
The second vector is what you're thinking about, BK API access. As I mentioned in my other comment, if someone compromises an API token, we have bigger things to worry about. They can do more damage than what verification is going to prevent. You're right that with that access they could bypass verification by sending a bogus refspec. But we have to skip verification in the custom refspec case, because we have to assume that main resolves to something meaningful locally, in order to check if a provided commit SHA legitimately exists on it or not. A custom refspec breaks that assumption, so we have to skip attempting verification or else risk false failures.
Let me know if all that makes sense!
| // Still 128 - full unshallow as last resort | ||
| e.shell.Commentf("Deepening insufficient, performing a full unshallow...") | ||
| _ = e.shell.Command("git", "fetch", "--unshallow").Run(ctx) |
There was a problem hiding this comment.
I wonder if the cost outweigh the gain here? Now the equation become, if someone put a random commit SHA in the build api, our agent will basically do a repo clone -> which depending on situation can take a long time.
An realistically, building for a old commit in CI is pretty rare 🤔
There was a problem hiding this comment.
I'm not sure what you're suggesting - do you think we should never go through the unshallow step, because it will make builds slower if an attacker sends a random commit via the build API? If this person already has API access, we have bigger things to worry about. The more likely attack scenario is someone putting a legitimate malicious commit on a branch and tricking us into building it as though it was on main (i.e. the pipeline runs with main branch permissions, could deploy, etc.).
You're thinking about the legitimate scenario of building an "old" commit where the "deepen by 50 commits" might not be enough. But in a busy monorepo, a rebuild from even a day ago could be more than 50 commits behind, so we'd have to unshallow to do the verification. A user is going to expect that. I'm just trying to avoid scenarios where we block legitimate builds.
There was a problem hiding this comment.
Ah soz I should be clearer, I think there is pretty big jump between check 50 commits to a full repo scan.
I wonder if it's an legitimate commit in a busy repo, how likely will we trigger the full repo scan.
This isn't a blocker, just something I am trying to understand.
There was a problem hiding this comment.
Got it! Yeah it's a tradeoff, and I did think about what the right balance was. First, this only matters for people using shallow clones, which AFAIK is a small percentage of current users. Second, if you do shallow clones on a repo that isn't very busy, deepen 50 is probably going to be enough - especially for people who are doing something reasonable like depth 10 rather than depth 1. It's only very busy branches where, if I understand your concern, deepen 50 might not be enough, but a full unshallow would be expensive. In that scenario you might wish for an intermediate deepen 250 step or something, before the full unshallow. Is that what you have in mind? If so, I think what I'd suggest is that we stick with the current solution and see how it goes. If this starts to become a problem, a potential solution would be to make the deepen amount configurable, instead of hard-coded to 50 commits. So for a busy repo, if this keeps coming up, they could change it to a value that suits them.
There was a problem hiding this comment.
👍 happy to give this a shot.
|
Another thing we raised in the Agent Office hour is the big feature branch approach. We recommend targeting the main branch: 1. to derisk the big bang deployment. 2. V4 is happening in parallel, we worry that you would hit non-trivial conflicts if we hold a long running feature branch. Oz and Ben might be able to share a bit more context, they were on the meeting. What do you think? |
|
@zhming0 yeah I'm catching up on our internal thread about that topic. I was just following the example we were already using for this work, but since it sounds like folks would rather have us merge to main in discrete chunks, I'm fine with that. |
Description
Adds commit-on-branch verification to the checkout phase.
As part of our agent checkout improvement work, we are adding support for git commit verification. This addresses a potential security issue: if an attacker adds a malicious commit to the repo, and is then able to trigger a build which specifies
commit: 3v1l5ha123branch: mainwe would build that commit as if it were on themainbranch, without verifying that this is the case, which could potentially lead to a production deployment of malicious code.This change introduces a
BUILDKITE_GIT_COMMIT_VERIFICATIONenvironment variable, which defaults tofalse. If set totrue, when we are given a commit and branch, before checkout and build, we do the following:git merge-base --is-ancestorto check if the commit really belongs to the specified branch.unshallowand check again.In terms of alternatives, the only change I considered was just going straight to
unshallowbefore verifying any commit. But I didn't want users to have to completely give up shallow commits to get verification, so I went with the "deepen first, then unshallow if necessary" approach.Context
See SUP-6535.
Changes
Adds a
BUILDKITE_GIT_COMMIT_VERIFICATIONenv var to enable to functionality. When set, we perform various git operations in between fetch and checkout, to make sure the provided commit is valid on the provided branch.Testing
go test ./...). Buildkite employees may check this if the pipeline has run automatically.go tool gofumpt -extra -w .)Disclosures / Credits
I used Claude to help me plan the feature and teach me some Go fundamentals. I coded the core functionality myself, and let Claude write the tests.